TF port of the Segment Anything Model (SAM) #22970
Conversation
The documentation is not available anymore as the PR was closed or merged.
```python
def flatten(input, start_dim=0, end_dim=-1):
    # Replicates the behavior of torch.flatten in TF

    # If end_dim or start_dim is negative, count them from the end
    if end_dim < 0:
        end_dim += input.shape.rank
    if start_dim < 0:
        start_dim += input.shape.rank

    if start_dim == end_dim:
        return input

    in_shape = tf.shape(input)
    flattened_dim = tf.math.reduce_prod(in_shape[start_dim : end_dim + 1])
    out_shape = tf.concat([in_shape[:start_dim], [flattened_dim], in_shape[end_dim + 1 :]], axis=0)
    return tf.reshape(input, out_shape)
```
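A quick usage check of the helper above (hypothetical example, not from the PR):

```python
import tensorflow as tf

t = tf.zeros((2, 3, 4, 5))
print(flatten(t).shape)        # (120,) - the default flattens every dim, like torch.flatten
print(flatten(t, 1, 2).shape)  # (2, 12, 5) - only dims 1 through 2 are merged
```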
🥲
I have no idea why I didn't do this before now!
```python
        return output_masks

    def post_process_masks_tf(
```
Have we started including separate post-processing ops in native TensorFlow? I thought they were NumPy only. This is indeed nice.
I wasn't sure about this - there's probably some code duplication in the processor I can remove.
Preprocessing is all in NumPy - this hasn't been extended to the postprocessing methods yet. Mainly because I haven't dared tackle torch.nn.functional.interpolate; partly because we haven't needed to yet.
That said - please don't have post_processing_xxx_tf! We don't use decode_tf for our tokenizers ;)
Could you rework the methods so there's a single post_process_xxx method and hidden framework-specific methods? i.e.

```python
def post_process_masks(self, masks, ...):
    if is_torch_tensor(masks):
        return self._post_process_masks_pt(...)
    if is_tf_tensor(masks):
        return self._post_process_masks_tf(...)
    ...
```
Sure! And sorry - I basically rushed through the processor code so I could get to the bit I was hyped about (benchmarking GPT-4's translations).
Force-pushed b9dd5a4 to b1f61bd
This is now almost ready to go and the code should be ready for review! Remaining issues:
sgugger left a comment
Why are there two different processing files, one of them not being imported anywhere?
The common tests should not be changed to have a higher tolerance - just override the right tests in the proper test file.
Also cc @amyeroberts since you reviewed the PyTorch model extensively.
Why have two functions that do the exact same thing?
Resolved as part of the general processor refactor!
Seems like this is leftover from debugging...
Resolved as part of the general processor refactor! (also oops, sorry)
What is the purpose of this file?
Shh, it's gone now. We don't talk about processing_tf_sam
Why have a separate test file to test the same class?
Also gone now!
Why change the tolerance for this model?
Adding a tolerance argument to the base tests triggered the test to run in other models, which caused this test to fail. I'll investigate and see if it's necessary, though!
Can we use more descriptive variable names?
This was copied straight from the PyTorch code, but on reflection I could probably refactor the whole thing out, because it was only there to deal with different memory orderings (whereas TensorFlow tensors are always contiguous and always have standard C memory ordering)
Done! I refactored the functional_layernorm function to handle alternate axes, and then just called that instead of this manual layernorm. Model output is unchanged and all integration tests still pass.
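For context, a functional layer norm over an arbitrary axis can be written roughly like this in TF (a minimal sketch only - the actual functional_layernorm in the PR may differ in name and signature):

```python
import tensorflow as tf

def functional_layernorm(inputs, weight, bias, epsilon=1e-6, axis=-1):
    # Normalize along the chosen axis, keeping dims so the stats broadcast
    mean = tf.math.reduce_mean(inputs, axis=axis, keepdims=True)
    variance = tf.math.reduce_variance(inputs, axis=axis, keepdims=True)
    normalized = (inputs - mean) * tf.math.rsqrt(variance + epsilon)
    # Reshape the affine parameters so they broadcast along that same axis
    shape = [1] * len(inputs.shape)
    shape[axis] = inputs.shape[axis]
    return normalized * tf.reshape(weight, shape) + tf.reshape(bias, shape)
```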
Replace this by?
Clarified that comment!
To address.
I never figured this out, but it's the same in Torch, and both models give equivalent outputs. @ArthurZucker do you know why this weight is non-trainable?
Couldn't find any reference to this random embedding in the paper (in fact, the paper always mentions learned positional embeddings), but the same pattern is in the SAM codebase
This meme is all I can think of
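For anyone following along, the pattern being discussed looks roughly like this (a sketch of the idea with assumed names, not the exact SAM code):

```python
import tensorflow as tf

class PositionEmbeddingRandom(tf.keras.layers.Layer):
    """Positional encodings from a fixed random Gaussian projection (sketch)."""

    def __init__(self, num_pos_feats=64, scale=1.0, **kwargs):
        super().__init__(**kwargs)
        self.num_pos_feats = num_pos_feats
        self.scale = scale

    def build(self, input_shape):
        # Initialized randomly once and never updated: the weight is saved in
        # the checkpoint, but trainable=False keeps gradients from touching it
        self.positional_embedding_matrix = self.add_weight(
            name="positional_embedding_matrix",
            shape=(2, self.num_pos_feats),
            initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=self.scale),
            trainable=False,
        )
        super().build(input_shape)
```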
Thanks for the review - about half of the comments relate to the processor code, which is definitely in need of a refactor, yes. Working on that now!
amyeroberts left a comment
Looking good!
Left some general comments - mainly w.r.t. the processing code. I'd like there to be as little TF/PT-specific code as possible. For postprocessing it's OK, as a lot of postprocessing is still PyTorch-specific, but preprocessing should be (as much as possible) framework-agnostic.
For the processor, can you add PT/TF cross-checks to make sure that the TF postprocessed outputs are equivalent to the PT ones?
```diff
 # overwrite from common since TFViTMAEForPretraining has random masking, we need to fix the noise
 # to generate masks during test
-def check_pt_tf_models(self, tf_model, pt_model, tf_inputs_dict):
+def check_pt_tf_models(self, tf_model, pt_model, tf_inputs_dict, tol=1e-5):
```
Do you need to add the tol argument here? Unless necessary, I'd avoid resetting the tol default in all the methods, so we only need to update it in one place.
I refactored this and reverted all the changes in the common tests
```python
if output_hidden_states:
    vision_hidden_states = vision_outputs[1]
if output_attentions:
    vision_attentions = vision_outputs[-1]
```
Could we instead pass in return_dict=True to self.vision_encoder and then explicitly access the values from the names? I'm not a big fan of accessing from indexes here
Done! (Also changed in the original PT code)
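The reworked access pattern looks roughly like this (a sketch; the output attribute names follow the usual transformers output classes and are assumptions here - note that sgugger points out further down that forcing return_dict=True breaks jit compilation, so that part was later reverted):

```python
vision_outputs = self.vision_encoder(
    pixel_values,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=True,
)
image_embeddings = vision_outputs.last_hidden_state
vision_hidden_states = vision_outputs.hidden_states  # None unless requested
vision_attentions = vision_outputs.attentions        # None unless requested
```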
```python
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
```
Why have these arguments?
I'm not clear about this one! Aren't these arguments common across most of our models?
I don't think so? Only SAM has get_image_embeddings, and all the other get_xxx_embeddings methods, as far as I can tell, just take self.
```python
# Matt: The original Torch code checked that the sum of sparse_prompt_embeddings equalled 0. However, this only
# happens when the sparse prompt embeddings are an empty tensor with shape[1] == 0. I replaced
# it with an explicit shape check to avoid data-dependent control flow which breaks XLA.
```
:)
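To spell out the difference (an illustrative sketch with assumed names, not the exact PR code):

```python
import tensorflow as tf

def pick_prompt_embeddings(sparse_prompt_embeddings, fallback_embeddings):
    # Data-dependent check (breaks XLA): branching on tensor *values*, which
    # are unknown while the graph is being traced.
    #   if tf.reduce_sum(sparse_prompt_embeddings) != 0: ...
    #
    # Shape-dependent check (XLA-friendly): shape[1] is static, so the branch
    # is resolved at trace time rather than at run time.
    if sparse_prompt_embeddings.shape[1] != 0:
        return sparse_prompt_embeddings
    return fallback_embeddings
```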
@amyeroberts @sgugger I refactored all the changes to the common tests, and just overrode …
gante left a comment
Looking good! 💪
Additional general comment: it seems like the Keras training argument is missing all around (in call and in the dropout layers)... but on the other hand, SAM is not trainable. Still, in case we add a training script, I'd add this quick future-proofing change :D
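A minimal sketch of the future-proofing being suggested (layer and parameter names are made up):

```python
import tensorflow as tf

class SketchBlock(tf.keras.layers.Layer):
    def __init__(self, hidden_size=256, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(hidden_size, name="dense")
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, hidden_states, training=False):
        hidden_states = self.dense(hidden_states)
        # Dropout only fires when training=True; at inference it is a no-op
        return self.dropout(hidden_states, training=training)
```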
Force-pushed 76cebb9 to 17536e4
@gante I think all comments are now addressed, and I added … All comments from @amyeroberts and @sgugger should be addressed too - are you okay with going ahead and merging now once tests pass?
sgugger left a comment
Thanks for all the work on this. @amyeroberts could you also have a look before this is merged?
```diff
 points (`torch.Tensor`, **optional**):
     point coordinates and labels to embed.
-boxes (`torch.Tensor`, **optionnal**):
+boxes (`torch.Tensor`, **optional**):
     boxes to embed
-masks (`torch.Tensor`, **optionnal**):
+masks (`torch.Tensor`, **optional**):
```
Since we are touching this, can you put the optionals in italics and not bold?
Done!
```diff
-    return_dict=return_dict,
+    return_dict=True,
```
This cannot be forced as return_dict breaks jit compilation. This change needs reverting.
My bad - this was my suggestion, sorry @Rocketknight1!
Done!
```python
        values.
        """

    def __init__(self, config, downsample_rate=None, **kwargs) -> None:
```
The -> None makes zero sense to me as a type annotation (I know it's what PEP says, but the init returns an instance of the class). Since there are no type annotations elsewhere, maybe just remove it?
Done! (for all classes across both the PT and TF files)
amyeroberts left a comment
Nice! 🔥
Thanks for iterating, and in particular for spending the time to add equivalence tests for the processor and keep the image processing code tidy with the two frameworks 🤗
```python
        self.assertTrue(np.all(tf_masks[0].numpy() == pt_masks[0].numpy()))

    def test_image_processor_equivalence(self):
```
🤗
This tolerance seems pretty high 👀
It's actually okay - the values for the scores are very large (usually in the range 5-30). A tolerance of 2e-4 for numbers that big is quite tight!
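To make that concrete (illustrative numbers, not the actual test values):

```python
import numpy as np

# For scores around 20, an absolute tolerance of 2e-4 corresponds to a
# relative error of roughly 1e-5 - quite tight for float32 differences
# between frameworks.
pt_score, tf_score = 20.0002, 20.0001
assert np.allclose(pt_score, tf_score, atol=2e-4)
print(abs(pt_score - tf_score) / pt_score)  # ~5e-6 relative error
```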
Note in #23376 - input_boxes should be a list of lists of ints.
Fixed!
I know this is just copying from the PT implementation - but it would be great to add info to the docstring about what's returned, as there are many objects.
I'll be honest that I don't understand it too well, lol. I'll leave that for a follow-up on the Torch end and copy the strings whenever they do it 😅
layer norm layer here should take eps from config
layer norm layers here should take eps from config
The PyTorch version doesn't, and just uses the 1e-6 default kwarg value!
OK 👍
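For reference, the two options under discussion look like this (a sketch; the config attribute name is an assumption):

```python
import tensorflow as tf

class DummyConfig:  # stand-in for the model config (attribute name assumed)
    layer_norm_eps = 1e-12

config = DummyConfig()

# What the review suggested: read the epsilon from the model config
ln_from_config = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps)

# What both the PT and TF implementations actually do: the hard-coded default
ln_hardcoded = tf.keras.layers.LayerNormalization(epsilon=1e-6)
```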
Let's hope it's not too experimental 😬
tnp has been around since 2.4, I think we're safe!
ha! for TF I doubt it ;)
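For reference, the tf.experimental.numpy (tnp) module being discussed exposes NumPy-style ops on TF tensors; a trivial example:

```python
import tensorflow.experimental.numpy as tnp

x = tnp.arange(6)           # NumPy-style construction; returns a TF tensor
x = tnp.reshape(x, (2, 3))  # NumPy-style ops that also work inside tf.function
print(tnp.mean(x, axis=0))  # [1.5 2.5 3.5]
```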
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Force-pushed 875cc35 to 3902969
I think comments are addressed now - are we okay to merge?
I'm treating silence as agreement, merging!
This is a first draft of the SAM port - will update this PR as I port tests and make sure everything is working okay. It's also a first proof-of-concept for full GPT-4 auto-translation from PyTorch: the entire modeling_tf_sam.py file was converted from PyTorch by GPT-4, with the exception of the imports at the top, because I haven't written a prompt for those yet.

Update: I checked over all of the code and fixed the issues in the GPT port. Equivalence tests all look good! This is almost ready to merge, but there are a few small issues left:

- channels_first doesn't actually work on CPU in TF